bzip2(1) bzip2(1)
NAME
bzip2, bunzip2 - a block-sorting file compressor, v0.1
bzip2recover - recovers data from damaged bzip2 files
SYNOPSIS
bzip2 [ -cdfkstvVL123456789 ] [ filenames ... ]
bunzip2 [ -kvsVL ] [ filenames ... ]
bzip2recover filename
DESCRIPTION
Bzip2 compresses files using the Burrows-Wheeler
block-sorting text compression algorithm, and Huffman coding.
Compression is generally considerably better than that
achieved by more conventional LZ77/LZ78-based compressors,
and approaches the performance of the PPM family of
statistical compressors.
The command-line options are deliberately very similar to
those of GNU Gzip, but they are not identical.
Bzip2 expects a list of file names to accompany the
command-line flags. Each file is replaced by a compressed
version of itself, with the name "original_name.bz2".
Each compressed file has the same modification date and
permissions as the corresponding original, so that these
properties can be correctly restored at decompression
time. File name handling is naive in the sense that there
is no mechanism for preserving original file names,
permissions and dates in filesystems which lack these
concepts, or have serious file name length restrictions,
such as MS-DOS.
Bzip2 and bunzip2 will not overwrite existing files; if
you want this to happen, you should delete them first.
If no file names are specified, bzip2 compresses from
standard input to standard output. In this case, bzip2
will decline to write compressed output to a terminal, as
this would be entirely incomprehensible and therefore
pointless.
Bunzip2 (or bzip2 -d) decompresses and restores all specified
files whose names end in ".bz2". Files without this
suffix are ignored. Again, supplying no filenames causes
decompression from standard input to standard output.
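The naming rules described above can be sketched in a few lines of
Python. This is only an illustration of the behaviour stated in the
text, not bzip2's own code; the function names compressed_name and
decompressed_name are hypothetical.

```python
from typing import Optional

SUFFIX = ".bz2"


def compressed_name(original: str) -> str:
    """Name given to the compressed replacement of a file."""
    return original + SUFFIX


def decompressed_name(name: str) -> Optional[str]:
    """Name restored on decompression, or None if the file is ignored."""
    if name.endswith(SUFFIX):
        return name[: -len(SUFFIX)]
    return None  # no ".bz2" suffix, so the file would be skipped


print(compressed_name("notes.txt"))
print(decompressed_name("notes.txt.bz2"))
print(decompressed_name("notes.txt"))
```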
You can also compress or decompress files to the standard
output by giving the -c flag. You can decompress multiple
files like this, but you may only compress a single file
this way, since it would otherwise be difficult to separate
out the compressed representations of the original
files.
Compression is always performed, even if the compressed
file is slightly larger than the original. Files of less
than about one hundred bytes tend to get larger, since the
compression mechanism has a constant overhead in the
region of 50 bytes. Random data (including the output of
most file compressors) is coded at about 8.05 bits per
byte, giving an expansion of around 0.5%.
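The effect of the constant overhead can be observed with the bz2
module from Python's standard library. Note the assumption: that
module wraps a later release of the bzip2 library, not the v0.1
program described here, so the exact byte counts may differ. Only
the qualitative behaviour is illustrated.

```python
import bz2
import random


def compressed_size(data: bytes) -> int:
    """Size in bytes of data after compression with block size 9."""
    return len(bz2.compress(data, compresslevel=9))


# A tiny input: the constant overhead outweighs any saving.
tiny = b"hello, world\n"

# Pseudo-random bytes stand in for already-compressed data.
rng = random.Random(0)  # fixed seed so that runs are repeatable
noise = bytes(rng.getrandbits(8) for _ in range(100_000))

# Highly repetitive data of the same length compresses very well.
repetitive = b"abcdefgh" * 12_500

for label, data in (("tiny", tiny), ("noise", noise), ("repetitive", repetitive)):
    print(label, len(data), "->", compressed_size(data))
```

The first two inputs should come out larger than they went in, the
third much smaller.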
As a self-check for your protection, bzip2 uses 32-bit
CRCs to make sure that the decompressed version of a file
is identical to the original. This guards against corruption
of the compressed data, and against undetected bugs
in bzip2 (hopefully very unlikely). The chance of data
corruption going undetected is microscopic, about one
chance in four billion for each file processed. Be aware,
though, that the check occurs upon decompression, so it
can only tell you that something is wrong. It can't
help you recover the original uncompressed data. You can
use bzip2recover to try to recover data from damaged
files.
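A minimal sketch of the check in action, again using Python's bz2
module as a stand-in (an assumption: it wraps a later release of
the bzip2 library, whose stream layout is also assumed here, with
the stored CRC of the first block starting at byte offset 10). The
helper name is_damage_detected is hypothetical.

```python
import bz2


def is_damage_detected(stream: bytes) -> bool:
    """Return True if decompressing stream fails with an error."""
    try:
        bz2.decompress(stream)
    except (OSError, ValueError):
        # OSError: invalid data or a failed integrity check.
        # ValueError: stream ended before its end-of-stream marker.
        return True
    return False


original = b"The quick brown fox jumps over the lazy dog.\n" * 200
good = bz2.compress(original)

# Flip one bit of the stored block CRC (assumed to sit at offset 10),
# so that it no longer matches the CRC of the decompressed data.
damaged = bytearray(good)
damaged[10] ^= 0x01
damaged = bytes(damaged)

print(is_damage_detected(good))
print(is_damage_detected(damaged))

# A 32-bit check misses a random corruption about once in 2**32 cases.
print(2 ** 32)
```

As the text says, the error only reports that something is wrong;
nothing in it helps to reconstruct the original data.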
Return values: 0 for a normal exit, 1 for environmental
problems (file not found, invalid flags, I/O errors, etc.),
2 to indicate a corrupt compressed file, 3 for an internal
consistency error (e.g., a bug) which caused bzip2 to panic.
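A calling script might translate these return values as follows.
This is a sketch only; the names EXIT_MEANINGS and describe_exit
are hypothetical, and the fallback text for other values is an
assumption, since the text documents only 0 through 3.

```python
EXIT_MEANINGS = {
    0: "normal exit",
    1: "environmental problem (file not found, invalid flags, I/O error)",
    2: "corrupt compressed file",
    3: "internal consistency error (bzip2 panicked)",
}


def describe_exit(code: int) -> str:
    """Translate a bzip2 return value into the meaning listed above."""
    return EXIT_MEANINGS.get(code, "undocumented return value")


for code in (0, 1, 2, 3):
    print(code, describe_exit(code))
```

The value passed in would typically be the returncode attribute of
the result of subprocess.run().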
MEMORY MANAGEMENT
Bzip2 compresses large files in blocks. The block size
affects both the compression ratio achieved, and the
amount of memory needed both for compression and
decompression. The flags -1 through -9 specify the block size
to be 100,000 bytes through 900,000 bytes (the default)
respectively. At decompression-time, the block size used
for compression is read from the header of the compressed
file, and bunzip2 then allocates itself just enough memory
to decompress the file. Since block sizes are stored in
compressed files, it follows that the flags -1 to -9 are
irrelevant to and so ignored during decompression.
Compression and decompression requirements, in bytes, can be
estimated as:
Compression:   400k + ( 7 x block size )
Decompression: 100k + ( 5 x block size ), or
               100k + ( 2.5 x block size )
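These estimates are simple enough to tabulate. The sketch below
assumes that "k" means 1000 bytes, which matches the 100,000-byte
block size steps, and that the 2.5 x figure is the reduced-memory
way of decompressing. All function names are hypothetical.

```python
K = 1000  # assumption: "k" is read as 1000 bytes


def block_size(flag: int) -> int:
    """Block size in bytes selected by the flags -1 through -9."""
    if not 1 <= flag <= 9:
        raise ValueError("flag must be between 1 and 9")
    return flag * 100 * K


def compress_memory(flag: int) -> int:
    """Estimated bytes needed to compress with the given flag."""
    return 400 * K + 7 * block_size(flag)


def decompress_memory(flag: int, small: bool = False) -> int:
    """Estimated bytes needed to decompress; small=True uses 2.5 x."""
    factor = 2.5 if small else 5
    return int(100 * K + factor * block_size(flag))


for flag in (1, 5, 9):
    print(
        f"-{flag}",
        f"{compress_memory(flag) // K}k",
        f"{decompress_memory(flag) // K}k",
        f"{decompress_memory(flag, small=True) // K}k",
    )
```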
Larger block sizes give rapidly diminishing marginal
returns; most of the compression comes from the first two
or three hundred k of block size, a fact worth bearing in
mind when using bzip2 on small machines. It is also
important to appreciate that the decompression memory
requirement is set at compression-time by the choice of
block size.
For files compressed with the default 900k block size,
bunzip2 will require about 4600 kbytes to decompress. To
support decompression of any file on a 4 megabyte machine,
bunzip2 has an option to decompress using approximately
half this amount of memory, about 2300 kbytes. Decompression
speed is also halved, so you should use this option
only where necessary. The relevant flag is -s.
In general, try to use the largest block size memory
constraints allow, since that maximises the compression
achieved. Compression and decompression speed are
virtually unaffected by block size.
Another significant point applies to files which fit in a
single block -- that means most files you'd encounter
using a large block size. The amount of real memory
touched is proportional to the size of the file, since the
file is smaller than a block. For example, compressing a
file 20,000 bytes long with the flag -9 will cause the
compressor to allocate around 6700k of memory, but only
touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the
decompressor will allocate 4600k but only touch 100k +
20000 * 5 = 200 kbytes.
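The arithmetic of this example can be checked as follows. The
sketch assumes that "k" means 1000 bytes and that, for a file
larger than one block, the whole allocation is touched (hence the
min() below). The name touched_memory is hypothetical.

```python
K = 1000  # assumption: "k" is read as 1000 bytes


def touched_memory(file_size: int, block: int, fixed: int, per_byte: int) -> int:
    """Estimate the bytes of memory actually touched for one file."""
    return fixed + per_byte * min(file_size, block)


file_size = 20_000
block = 900 * K  # block size selected by the flag -9

compress_touched = touched_memory(file_size, block, 400 * K, 7)
decompress_touched = touched_memory(file_size, block, 100 * K, 5)

# Passing the block size as the file size gives the full allocation.
compress_allocated = touched_memory(block, block, 400 * K, 7)
decompress_allocated = touched_memory(block, block, 100 * K, 5)

print(compress_touched // K, "of", compress_allocated // K, "kbytes")
print(decompress_touched // K, "of", decompress_allocated // K, "kbytes")
```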
Here is a table which summarises the maximum memory usage
for different block sizes. Also recorded is the total
compressed size for 14 files of the Calgary Text Compression
Corpus totalling 3,141,622 bytes. This column gives
some feel for how compression varies with block size.
These figures tend to understate the advantage of larger
block sizes for larger files, since the Corpus is
dominated by smaller files.
        Compress   Decompress   Decompress   Corpus
Flag     usage       usage       -s usage     Size
 -1      1100k        600k         350k      914704
 -2      1800k       1100k         600k